5 research outputs found

    A methodology for selective protection of matrix multiplications: A diagnostic coverage and performance trade-off for CNNs executed on GPUs

    The ability of CNNs to efficiently and accurately perform complex functions, such as object detection, has fostered their adoption in safety-related autonomous systems. These algorithms require high computational performance platforms that exploit high levels of parallelism. The detection, control and mitigation of random errors in these underlying high computational platforms become a must according to functional safety standards. In this paper, we propose protecting, with a catalog of diagnostic techniques, the most computationally expensive operation of CNNs, the matrix multiplication. However, this protection entails a performance penalty, and complete CNN protection may be unaffordable for systems operating under strict real-time constraints. This paper proposes a three-stage methodology to selectively protect CNN layers and achieve the required trade-off between diagnostic coverage and performance: i) a per-layer analysis of sensitivity to misclassification using a statistical fault injection campaign, ii) a layer-by-layer analysis of performance impact and diagnostic coverage, and iii) selective layer protection. Furthermore, we propose a strategy to effectively compute the achievable diagnostic coverage of large matrices implemented on GPUs. Finally, we apply the proposed methodology and strategy to Tiny YOLO-v3, an object detector based on CNNs. Ikerlan authors have received funding from Elkartek grant project KK-2021/00123 of the Basque government. BSC authors have been partially supported by the Spanish Ministry of Science and Innovation under grant PID2019-107255GBC21/AEI/10.13039/501100011033. Peer Reviewed. Postprint (author's final draft)
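
    The abstract above does not describe the fault model used in its fault injection campaign. As a rough illustration only, a common way to emulate a transient hardware error in software is to invert a single bit of a stored float32 value. The sketch below assumes that single-bit-flip model; the names `flip_bit` and `inject_fault` are hypothetical and are not taken from the paper.

```python
import random
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value` with one bit of its IEEE-754 float32 encoding inverted."""
    if not 0 <= bit < 32:
        raise ValueError("bit must be in the range [0, 32)")
    # Reinterpret the float32 as an unsigned 32-bit integer.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    # Invert the selected bit and reinterpret the result as a float32.
    (faulty,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return faulty


def inject_fault(weights, rng):
    """Copy `weights` and flip one random bit of one randomly chosen element.

    Assumed fault model: a single bit flip in a single stored value.
    Returns the faulty copy plus the element index and bit position used.
    """
    faulty = list(weights)
    index = rng.randrange(len(faulty))
    bit = rng.randrange(32)
    faulty[index] = flip_bit(faulty[index], bit)
    return faulty, index, bit


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so a campaign can be repeated
    layer_weights = [0.5, 0.25, 1.0, 2.0]
    faulty, index, bit = inject_fault(layer_weights, rng)
    print(f"flipped bit {bit} of element {index}: {layer_weights[index]} -> {faulty[index]}")
```

    In a per-layer campaign, one would repeat such injections many times for each layer and record how often the network's output changes; the number of repetitions and the pass/fail criterion are choices the abstract does not specify.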

    GPU devices for safety-critical systems: a survey

    Graphics Processing Unit (GPU) devices and their associated software programming languages and frameworks can deliver the computing performance required to facilitate the development of next-generation high-performance safety-critical systems such as autonomous driving systems. However, the integration of complex, parallel, and computationally demanding software functions with different safety-criticality levels on GPU devices with shared hardware resources contributes to several safety certification challenges. This survey categorizes and provides an overview of research contributions that address GPU devices’ random hardware failures, systematic failures, and independence of execution. This work has been partially supported by the European Research Council with Horizon 2020 (grant agreements No. 772773 and 871465), the Spanish Ministry of Science and Innovation under grant PID2019-107255GB, the HiPEAC Network of Excellence and the Basque Government under grant KK-2019-00035. The Spanish Ministry of Economy and Competitiveness has also partially supported Leonidas Kosmidis with a Juan de la Cierva Incorporación postdoctoral fellowship (FJCI-2020-045931-I). Peer Reviewed. Postprint (author's final draft)

    On the safe deployment of matrix multiplication in massively parallel safety-related systems

    Deep learning technology has enabled the development of increasingly complex safety-related autonomous systems using high-performance computers, such as graphics processing units (GPUs), which provide the high computing performance required to execute parallel computing algorithms, such as matrix–matrix multiplications (a central computing element of deep learning software libraries). However, the safety certification of parallel computing software algorithms and GPU-based safety-related systems remains a challenge, for example in achieving the required fault tolerance and diagnostic coverage for random hardware errors. This paper contributes a safe matrix–matrix multiplication software implementation for GPUs with detection capabilities for random hardware errors (permanent, transient) that can be used with different architectural patterns for fault tolerance, and which serves as a foundation for the implementation of safe deep learning libraries for GPUs. The proposed contribution is complementary and can be combined with other techniques, such as algorithm-based fault tolerance. In particular, (i) we provide the high-performance matrix multiplication CUTLASS library with a catalog of diagnostic mechanisms to detect random hardware errors down to the arithmetic operation level; and (ii) we measure the performance impact incurred by the adoption of these mechanisms and their achievable diagnostic coverage with a set of representative matrix dimensions. To that end, we implement these algebraic operations, targeting CUDA cores with single instructions and multiple-thread math instructions in an NVIDIA Xavier NX GPU. The research of this paper has received funding from the European Union’s Horizon 2020 research and innovation programme (grant agreement No 871465 (UP2DATE)). Peer Reviewed. Postprint (published version)
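
    The abstract above names algorithm-based fault tolerance (ABFT) as a technique that can be combined with its mechanisms. The sketch below illustrates the classic column-checksum form of ABFT for a matrix product in plain Python. It is not the paper's CUTLASS implementation, and the helper names `matmul` and `verify_product` are hypothetical. The check relies on the identity that the column sums of C = A·B equal the column-sum vector of A multiplied by B.

```python
def matmul(a, b):
    """Plain row-by-column product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def verify_product(a, b, c, tol=1e-9):
    """Check a candidate product `c` of `a` and `b` with a column checksum.

    Expected checksum: (column sums of a) multiplied by b.
    Observed checksum: column sums of c.
    A mismatch beyond the relative tolerance `tol` signals a corrupted result.
    """
    col_sums_a = [sum(col) for col in zip(*a)]
    expected = matmul([col_sums_a], b)[0]
    observed = [sum(col) for col in zip(*c)]
    return all(
        abs(e - o) <= tol * max(1.0, abs(e)) for e, o in zip(expected, observed)
    )


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    c = matmul(a, b)
    print("fault-free result accepted:", verify_product(a, b, c))
    c[0][1] += 1.0  # emulate a corrupted output element
    print("corrupted result accepted:", verify_product(a, b, c))
```

    The checksum costs one extra vector-matrix product and a pass over C, which is small next to the full product for large matrices. The tolerance is needed because floating-point summation order differs between the two checksum paths, and choosing it is a design decision this sketch leaves open.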

    Towards functional safety compliance of matrix–matrix multiplication for machine learning-based autonomous systems

    Autonomous systems execute complex tasks to perceive the environment and take self-aware decisions with limited human interaction. This autonomy is commonly achieved with the support of machine learning algorithms. The nature of these algorithms, which need to process large data volumes, places high performance demands on the underlying hardware. As a result, the embedded critical real-time domain is adopting increasingly powerful processors that combine multi-core processors with accelerators such as GPUs. The resulting hardware and software complexity makes it difficult to demonstrate that the system will run safely and reliably. This is the main objective of functional safety standards, such as IEC 61508 or ISO 26262, which deal with the avoidance, detection and control of hardware or software errors. In this paper, we adopt those measures for the safe inference of machine learning libraries on multi-core devices, two topics that are not explicitly covered in the current version of the standards. To this end, we adapt the matrix-matrix multiplication function, a central element of existing machine learning libraries, according to the recommendations of functional safety standards. The paper makes the following contributions: (i) adoption of recommended programming practices for the avoidance of programming errors in the matrix-matrix multiplication, (ii) inclusion of diagnostic mechanisms based on widely used checksums to control runtime errors, and (iii) evaluation of the performance impact of these measures and quantification of the achieved diagnostic coverage. For this purpose, we implement the diagnostic mechanisms on one of the ARM R5 cores of a Zynq UltraScale+ multi-processor system-on-chip, and we then adapt them to an Intel i7 processor with native code employing vectorization for the sake of performance. Peer Reviewed. Postprint (author's final draft)